Abstract — A hyperspectral image is a stack of more than
a hundred images, called bands, of the same region, taken
at contiguous frequencies. The reference image of the
region is called the Ground Truth map (GT). The problem is
how to find the bands that are useful for classifying the pixels
of the region, because bands can be not only redundant but
also a source of confusion, which decreases classification
accuracy. Some methods use Mutual Information (MI) and a
threshold to select relevant bands. A recent selection algorithm
based on mutual information uses a rejection bandwidth and a
threshold to control and eliminate redundancy. The band with
the highest MI is selected, and if its neighbors have
approximately the same MI with the GT, they are considered
redundant and discarded. This is the main drawback of the
method, because it gives up the advantage of hyperspectral
images: some precious information can be discarded. In this
paper we distinguish between useful and useless redundancy.
A band contains useful redundancy if it contributes to
decreasing the error probability. Following this scheme, we
introduce a new algorithm that also uses mutual information,
but retains only the bands that minimize the classification
error probability. To control redundancy, we introduce a
complementary threshold, so a good candidate band must bring
the error probability below its last value reduced by the
threshold. This process is a wrapper strategy; it achieves high
classification accuracy but is more expensive than a filter
strategy.

Index Terms — Hyperspectral images, classification, feature
selection, error probability, redundancy.



I. Introduction

In the feature classification domain, the choice of data
strongly affects the results. In a hyperspectral image, not
all bands carry information; some bands are irrelevant, like
those affected by various atmospheric effects (see Figure 4),
and decrease the classification accuracy. There are also
redundant bands that complicate the learning system and
produce incorrect predictions [14]. Even when the bands
contain enough information about the scene, they may fail to
predict the classes correctly if the dimension of the image
space (see Figure 3) is so large that many cases are needed
to detect the relationship between the bands and the scene
(Hughes phenomenon) [10]. We can reduce the dimensionality
of hyperspectral images by selecting only the relevant bands
(feature selection or subset selection methodology), or by
extracting, from the original bands, new bands containing the
maximal information about the classes, using any functions,
logical or numerical (feature extraction methodology) [11] [9].
Here we focus on feature selection using mutual information.
Hyperspectral images have three advantages over multispectral
images [6], see Figure 1.

First: a hyperspectral image contains more than a hundred
images, whereas a multispectral image contains three to ten
images.

Second: a hyperspectral image has a spectral resolution (the
central wavelength divided by the width of the spectral band)
of about a hundred, whereas that of a multispectral image is
about ten.

Third: the bands of a hyperspectral image are regularly spaced,
whereas those of a multispectral image are wide and irregularly
spaced.

Comment: when we reduce the dimensionality of hyperspectral
images, we must preserve the precision and high discrimination
of substances that hyperspectral images provide.



[Figure 1: panels Monochrome, Multispectral and Hyperspectral; axes: intensity or reflectance versus wavelength]

Fig. 1. Precision and discrimination added by hyperspectral images



In this paper we use the hyperspectral image AVIRIS
92AV3C (Airborne Visible Infrared Imaging Spectrometer)
[2]. It contains 220 images taken over the "Indian Pines"
region in north-western Indiana, USA [1]. The 220 images,
called bands, are taken between 0.4 µm and 2.5 µm. Each band
has 145 lines and 145 columns. The ground truth map is also
provided, but only 10366 pixels are labeled from 1 to 16. Each
label indicates one of 16 classes. The zeros indicate pixels
that are not classified yet (Figure 2).
Fig. 2. The Ground Truth map of AVIRIS 92AV3C and the 16 classes

The hyperspectral image AVIRIS 92AV3C contains numbers
between 955 and 9406. Each pixel of the ground truth
map has a set of 220 numbers (measures) along the hyperspectral
image. These numbers represent the reflectance of the pixel
in each band. So the pixel is represented as a vector of 220
components. Figure 3 shows the notion of pixel vector [7].
Reducing dimensionality therefore means selecting only the
dimensions that carry a lot of information about the classes.
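
The pixel vector notion can be illustrated with a small sketch. It assumes the image is held in a NumPy array of shape (lines, columns, bands); the random cube and the band indices below are placeholders for illustration, not the AVIRIS data or a real selection.

```python
import numpy as np

# Hypothetical stand-in for the 92AV3C cube: 145 lines x 145 columns x 220 bands,
# filled with random values in the range reported in the text (955 to 9406).
rng = np.random.default_rng(0)
cube = rng.integers(955, 9407, size=(145, 145, 220))

# Each pixel becomes one row: a vector of 220 measures, one per band.
pixels = cube.reshape(-1, cube.shape[-1])

# Band selection keeps only some columns of this matrix.
selected_bands = [10, 50, 120]  # arbitrary indices, for illustration only
reduced = pixels[:, selected_bands]
```

With this layout, reducing dimensionality is simply a choice of columns; the rest of the paper is about which columns to keep.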



Fig. 3. The notion of pixel vector

We can also note that not all bands carry information.
In Figure 4, for example, we can see the influence of
atmospheric effects on bands 155, 220 and others. This
hyperspectral image therefore illustrates the problem of
dimensionality reduction.

II. Mutual information based selection

In this section we review a recent band selection scheme
using mutual information, together with a rejection bandwidth
algorithm to eliminate redundancy [3] [7].

A. Definition of mutual information

This is a measure of the information exchanged between two
ensembles of random variables A and B:

$$I(A,B) = \sum_{a,b} p(a,b)\,\log_2 \frac{p(a,b)}{p(a)\,p(b)}$$

Considering the ground truth map and the bands as ensembles
of random variables, we calculate their interdependence.

Geo [3] also uses the average of bands 170 to 210 to
produce an estimated ground truth map, and uses it instead of
the real truth map. Their curves are similar (Figure 4).
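
The interdependence between a band and the ground truth map can be estimated from their joint histogram. The sketch below is one common way to do this, not necessarily the estimator used in [3]; the function name, the number of bins and the toy data are assumptions made for illustration.

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """Estimate I(A, B) in bits from the joint histogram of two images."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = joint / joint.sum()                 # joint probabilities p(a, b)
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal p(a), shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal p(b), shape (1, bins)
    nz = p_ab > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))

# Toy data (not AVIRIS): a label map, a band that follows the labels,
# and a band that is pure noise.
rng = np.random.default_rng(0)
gt = rng.integers(0, 17, size=(50, 50))
informative = gt * 100 + rng.normal(0, 5, size=gt.shape)
noisy = rng.normal(0, 1, size=gt.shape)

mi_informative = mutual_information(gt, informative)
mi_noisy = mutual_information(gt, noisy)
```

Histogram estimates are biased upward for small samples, so the value for the noise band is small but not exactly zero; only the ranking between bands matters for the selection.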

B. Bands selection using mutual information

From the mutual information curve, we can set a threshold
and retain the bands whose mutual information value is
above the threshold. But adjacent bands are possibly
redundant. Geo [3] proposes an algorithm to eliminate
redundancy. It is described in [3] as follows: "Let B_m be the
band that maximizes the mutual information, and N the number
of neighboring bands of B_m. We define:

$$d(n) = MI(n) - MI(n-1)$$

If max_n d(n) is below a threshold, only B_m is retained",
i.e. its N neighbors are discarded, because they may be
redundant. Let X be the number of bands to be selected. At
some point in the selection process, let SS be the set of
selected bands, and let R be the set of remaining bands. We
initialize the process with SS = ∅ and R = {1, 2, ..., 220}.

Algorithm 1 Proposed dimensionality reduction and redundancy
control

while |SS| < X do

    Select band index S = argmax_{s ∈ R} MI(s)

    Neighbours set N = {n | n = S - (B + 1), ..., S, ..., S + B}

    if max_{n ∈ N} d(n) < threshold then

        SS ← SS ∪ {S} and R ← R \ SS \ N
    else

        SS ← SS ∪ {S} and R ← R \ SS
    end if
end while

For more details refer to [3]. 
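
The selection loop described in this subsection can be sketched as follows. This is an illustrative reading of the algorithm, not the implementation of [3]: the function name and parameters are hypothetical, `half_width` plays the role of B, the window is taken as symmetric around the selected band, and d(n) = MI(n) - MI(n-1) is evaluated over that window.

```python
import numpy as np

def select_bands_rejection(mi, n_select, half_width, threshold):
    """Greedy MI ranking with a rejection bandwidth (illustrative sketch)."""
    mi = np.asarray(mi, dtype=float)
    remaining = set(range(len(mi)))
    selected = []
    while len(selected) < n_select and remaining:
        # Best remaining band; ties are broken in favour of the lowest index.
        s = max(remaining, key=lambda i: (mi[i], -i))
        lo = max(s - half_width, 1)
        hi = min(s + half_width, len(mi) - 1)
        # d(n) = MI(n) - MI(n - 1) for every n in the window around s.
        d = mi[lo:hi + 1] - mi[lo - 1:hi]
        selected.append(s)
        remaining.discard(s)
        if d.size and d.max() < threshold:
            # Neighbours are judged redundant: reject the whole bandwidth.
            remaining -= set(range(s - half_width, s + half_width + 1))
    return selected

# Toy MI curve (made-up values): a flat plateau around band 2.
toy_mi = [0.50, 0.51, 0.52, 0.51, 0.49, 0.30, 0.10]
with_rejection = select_bands_rejection(toy_mi, 2, 1, 0.05)
without_rejection = select_bands_rejection(toy_mi, 2, 1, 0.0)
```

On the toy curve, the neighbours of band 2 are rejected when the threshold is 0.05, so the second band comes from outside the plateau; with a threshold of 0 nothing is rejected and the second band is an immediate neighbour.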

C. Discussion and critique of the method

This algorithm is applied to the mutual information calculated
with the estimated ground truth map (Figure 4). As in [3],
50 [...] The main drawback of this method is how it measures
redundancy: small values of d(n) = MI(n) - MI(n-1) are not
necessarily an expression of redundancy. It is said in [3] that
the neighboring bands are possibly redundant. So with this
method, the advantage of hyperspectral images over
multispectral images (the precision discussed in Section I) is
lost, because precious information can be discarded.



D. Partial conclusion

In this section we examined the effectiveness of mutual
information based selection for hyperspectral images. In the
next step we also use mutual information.

One drawback of this filter approach is the time needed to
adjust the parameters manually, which can be expensive. But the
main drawback is that it reduces redundancy by eliminating
the precision that hyperspectral images provide. We now
propose an algorithm that avoids only useless redundancy.
We apply this rule: "If a band decreases the error probability,
it will be retained even if it contains redundant information".
Fig. 4. Mutual information of AVIRIS with the Ground Truth map (solid line)
and with the ground truth approximated by averaging bands 170 to 210 (dashed
line).

III. The measure of error probability
A. Inequality of Fano

Here we examine the inequality of Fano [8]:

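$$\frac{H(C/X) - 1}{\mathrm{Log}_2(N_c)} \le P_e \le \frac{H(C/X)}{\mathrm{Log}\,2}$$

with:

$$\frac{H(C/X) - 1}{\mathrm{Log}_2(N_c)} = \frac{H(C) - I(C;X) - 1}{\mathrm{Log}_2(N_c)}$$

and:

$$P_e \le \frac{H(C) - I(C;X)}{\mathrm{Log}\,2} = \frac{H(C/X)}{\mathrm{Log}\,2}$$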

The conditional entropy H(C/X) is calculated between the
ground truth map (i.e. the classes C) and the subset of
candidate bands X. N_c is the number of classes. So when the
features X have a higher value of mutual information with the
ground truth map (i.e. are closer to the ground truth map),
the error probability is lower. But it is difficult to compute
this joint mutual information I(C;X), given the high
dimensionality [14]. This property makes mutual information
a good criterion to measure the resemblance between two bands,
as exploited in Section II. In what follows, we are interested
in the case of a single candidate feature X.
Corollary: for one feature X, as X approaches the ground truth
map, the interval for P_e becomes very small.
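
As a rough numerical illustration of these bounds, the sketch below computes H(C/X) for two discrete label maps and derives a lower and an upper bound on P_e. It assumes C and X are already discretized. The function names are hypothetical. The denominator of the upper bound is exposed as a parameter because its exact value is an assumption here; the default of 2 corresponds to the Hellman-Raviv form H(C/X)/2.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy, in bits, of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(c, x):
    """H(C/X) = H(C, X) - H(X) for two discrete label arrays."""
    c, x = np.ravel(c), np.ravel(x)
    _, c_idx = np.unique(c, return_inverse=True)
    x_values, x_idx = np.unique(x, return_inverse=True)
    joint = c_idx * len(x_values) + x_idx   # one integer code per (c, x) pair
    return entropy(joint) - entropy(x)

def error_bounds(c, x, n_classes, upper_denominator=2.0):
    """Lower (Fano) and upper bounds on P_e; upper_denominator is an assumption."""
    h = conditional_entropy(c, x)
    lower = max((h - 1.0) / np.log2(n_classes), 0.0)
    upper = h / upper_denominator
    return lower, upper

# Two classes: X identical to C, then X independent of C.
c = np.array([0, 0, 1, 1])
bounds_same = error_bounds(c, c, n_classes=2)
bounds_indep = error_bounds(c, np.array([0, 1, 0, 1]), n_classes=2)
```

When X reproduces C exactly, H(C/X) is zero and both bounds collapse to zero, which is the situation described by the corollary.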

B. Algorithm based on inequality of Fano

Our idea is based on this observation: the band that has the
highest value of mutual information with the ground truth
map can be a good approximation of it. So we consider that the
subset of selected bands is good if it can generate an
estimated reference map approximately equal to the ground
truth map. This is clearly an Incremental Wrapper-based Subset
Selection (IWSS) approach [16] [13].

Our band selection process is as follows: we order the bands
according to the value of their mutual information with
the ground truth map. Then we initialize the set of selected
bands with the band that has the highest value of MI. At each
point of the process, we build an approximated reference map
C_est with the already selected bands, and we use it instead
of X to compute the error probability (P_e): the latest band
added (to those already selected) must make P_e decrease;
otherwise it is discarded from the retained set. Then we
introduce a complementary threshold Th to control redundancy.
So the band to be selected must make the error probability less
than (P_e - Th), where P_e is calculated before adding it. The
following algorithm shows the process in more detail:

Let SS be the set of bands already selected and S the
candidate band to be selected. Build_estimated_C() is a
procedure to construct the estimated reference map. P_e is
initialized with a value P_e*. X is the number of bands to be
selected, SS is empty and R = {1, ..., 220}.



Algorithm 2 Proposed for dimensionality reduction and
redundancy control

while |SS| < X do

    Select band index S = argmax_{s ∈ R} MI(s)
    SS ← SS ∪ {S} and R ← R \ {S}

    C_est ← Build_estimated_C(SS)

    P_e ← computed from H(C/C_est) / Log 2 and (H(C/C_est) - 1) / Log_2(N_c)

    if P_e < P_e* - Th then

        P_e* ← P_e
    else

        SS ← SS \ {S}

    end if
end while
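
The loop of Algorithm 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `error_probability` is a hypothetical callable standing in for the step that builds C_est and evaluates P_e, and the toy error function below is invented so that the example runs without a classifier.

```python
from typing import Callable, List, Sequence

def select_bands_wrapper(
    mi: Sequence[float],
    error_probability: Callable[[List[int]], float],
    threshold: float,
    n_select: int,
    initial_error: float = 1.0,
) -> List[int]:
    """Keep a band only if it lowers the error probability by more than threshold."""
    # Candidate bands ordered by decreasing mutual information with the GT.
    order = sorted(range(len(mi)), key=lambda i: mi[i], reverse=True)
    selected: List[int] = []
    last_error = initial_error                # plays the role of P_e*
    for band in order:
        if len(selected) >= n_select:
            break
        candidate = selected + [band]
        error = error_probability(candidate)  # P_e obtained with C_est
        if error < last_error - threshold:
            selected = candidate              # useful band: keep it
            last_error = error
        # Otherwise the band is discarded and not reconsidered.
    return selected

# Toy setting: bands 0 and 1 carry the same information (group "a").
groups = {0: "a", 1: "a", 2: "b", 3: "c"}
toy_mi = [0.9, 0.8, 0.7, 0.6]

def toy_error(bands):
    distinct = len({groups[b] for b in bands})
    return 1.0 - 0.3 * distinct - 0.01 * (len(bands) - distinct)

strict = select_bands_wrapper(toy_mi, toy_error, threshold=0.05, n_select=4)
loose = select_bands_wrapper(toy_mi, toy_error, threshold=0.0, n_select=4)
```

In the toy setting, band 1 lowers the error only marginally, so it is rejected with a threshold of 0.05 and accepted with a threshold of 0, which mirrors the role of Th in controlling redundancy.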



C. Results and analysis

We apply this algorithm to the hyperspectral image AVIRIS
92AV3C [1], under the same conditions as in Section II-C.

The procedure used to construct the estimated reference map
C_est is the same SVM classifier used for classification.



TABLE I

Results illustrating the elimination of redundancy using the
algorithm based on the inequality of Fano, for several thresholds (Th)

Table I shows the results obtained for several thresholds.
We can see how effectively our algorithm selects bands,
and the important effect of avoiding redundancy.

Figure 5 shows the accuracy curves in more detail, versus the
number of bands retained, for several thresholds. This covers
all behaviors of the algorithm:

First: for the highest threshold values (0.1, 0.05, 0.03 and
0.02) we obtain a hard selection: only a few of the most
informative bands are selected; the classification accuracy is
90% with fewer than 20 bands selected.

Second: for the medium threshold values (0.015, 0.012, 0.010,
0.008, 0.006), some redundancy is allowed in order to
increase the classification accuracy.

Third: for the small threshold values (0.001 and 0), the
redundancy allowed becomes useless; we have the same accuracy
with more bands.

Finally: for negative thresholds, for example -0.01, we
allow all bands to be selected, and the algorithm has no
effect. This corresponds to selection by ordering the bands
on mutual information. The performance is low.

We can note here that [15] uses two axioms to characterize
feature selection. Sufficiency axiom: the selected feature
subset must be able to reproduce the training samples without
losing information. Necessity axiom: "simplest among different
alternatives is preferred for prediction". In the proposed
algorithm, reducing the error probability between the truth map
and the estimated one minimizes the information lost for the
training samples and also for the predicted ones.

We also note that we can use the number of selected features
as a condition to stop the search, so we can obtain a hybrid
filter-wrapper approach [16].

Partial conclusion: the proposed algorithm is an effective
method to reduce the dimensionality of hyperspectral images,
reaching 90% accuracy with fewer than 20 bands.

We illustrate in Figure 6 the Ground Truth map as originally
displayed in Figure 2, and the scene classified with our
method for a threshold of 0.03, i.e. 18 bands selected.

Table II indicates the classification accuracy of each class
for several thresholds.

Fig. 5. Accuracy of classification using the algorithm based on the inequality
of Fano, for several thresholds.

TABLE II

Accuracy of classification (%) of each class for several
thresholds (Th)

Class   Total pixels   Th = 0.00   0.001   0.008   0.015   0.020   0.030
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
Fig. 6. Original Ground Truth map (left) and the map produced by
our algorithm with the threshold 0.03, i.e. 18 bands (right).
Accuracy = 90%.
First: we can note the effectiveness of this algorithm,
particularly for the classes with few pixels, for
example class number 9.

Second: we can note that 18 bands (i.e. threshold 0.03) are
sufficient to detect the materials contained in the region.
This is also shown in Figure 6.

Third: an important observation is that most class
accuracies change when the threshold varies between
0.03 and 0.015.

IV. Conclusion

In this paper we presented the need to reduce the number
of bands in the classification of hyperspectral images. We
then commented on the results of a mutual information based
scheme that filters redundancy. We showed its effectiveness
in selecting bands able to classify the pixels of the ground
truth. We also showed its drawback: the loss of precision
caused by discarding neighboring bands that have approximately
the same mutual information with the ground truth map. We
introduced an algorithm also based on mutual information and
using a measure of error probability (inequality of Fano). To
be chosen, a band must contribute to reducing the error
probability. A complementary threshold is added to avoid
redundancy, so each band retained has to reduce the error
probability by a step equal to the threshold, even if it
carries redundant information. We can say that we preserve the
useful redundancy by adjusting the complementary threshold.
This algorithm is a feature selection methodology, but it is a
wrapper approach, because we use the classifier to build the
estimated reference map. This is a limitation that should be
overcome by finding another procedure to estimate the reference
map more rapidly, in order to use it in real-time applications.
Given its performance, this scheme is worth investigating and
improving.